WIP: Add support for external engines #84
base: main
Is the planner able to distinguish when it should and shouldn't use these remote rules?
hoptimator-k8s/src/main/java/com/linkedin/hoptimator/k8s/K8sEngine.java
hoptimator-k8s/src/main/java/com/linkedin/hoptimator/k8s/K8sEngineTable.java
String name = engine.engineName() + "-" + inTrait.database();
JdbcConvention outTrait = JdbcConvention.of(dialect, inTrait.expression, name);

System.out.println("Registering rules for " + name + " using dialect " + dialect.toString());
nit: use logger or remove (applicable to other lines in this file as well)
will do
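One way the nit above could be addressed is sketched below. It is a minimal illustration that assumes `java.util.logging`; the project may well use a different logging framework, and the class name, helper names, and the sample engine, database, and dialect values are hypothetical, not taken from this PR.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in class; not part of the PR.
class RuleRegistrationLoggingSketch {
  private static final Logger LOGGER =
      Logger.getLogger(RuleRegistrationLoggingSketch.class.getName());

  // Builds the convention name the same way as the snippet under review:
  // engine name, a dash, then the database name.
  static String conventionName(String engineName, String database) {
    return engineName + "-" + database;
  }

  // Replaces the System.out.println call with a parameterized log record.
  // At FINE level the message is suppressed under the default JUL configuration.
  static void logRegistration(String name, String dialect) {
    LOGGER.log(Level.FINE, "Registering rules for {0} using dialect {1}",
        new Object[] {name, dialect});
  }

  public static void main(String[] args) {
    // Sample values are assumptions for illustration only.
    logRegistration(conventionName("trino", "ads"), "TrinoSqlDialect");
  }
}
```

Logging at a debug-style level keeps the registration message available for troubleshooting without writing to standard output on every connection.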
Yes, via four mechanisms:
Eventually we may want to add more details to the Internally, we can install different engines that target different databases, e.g. Trino can target offline while Flink targets nearline.
Summary
This adds support for installing remote query engines, e.g. Trino, DuckDB, or Flink SQL Gateway.
- k8s.engines metadata table.
- RemoteTableScan, RemoteJoin, associated optimizer rules.

Details
The Hoptimator JDBC Driver is able to talk to remote Databases, but it previously relied on Calcite's Enumerable engine to process queries locally. For example, joining tables in two different Databases would involve first fetching the rows from each table and then joining locally in the driver itself.

With Engines, we can outsource these operations to fast, distributed query engines like Trino. Queries are sent off to the remote engine, and the Driver simply collects the results.

Testing
Without an Engine installed, a query must be processed locally via the Enumerable convention:

The EnumerableNestedLoopJoin would be very slow for large datasets.

After installing an engine, we see that the query plan now involves a RemoteJoin instead:

The RemoteJoin is able to leverage Trino or similar distributed query engines.
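
To illustrate why a locally executed nested-loop join scales poorly, here is a minimal plain-Java sketch of the general technique. It is not Calcite's EnumerableNestedLoopJoin implementation; the class, method, and field names are hypothetical, and it joins on bare integer keys only to make the comparison count visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a nested-loop equi-join; not Calcite code.
class NestedLoopJoinSketch {
  // Number of key comparisons performed by the most recent join() call.
  static long comparisons = 0;

  // Compares every left key with every right key and emits matching pairs.
  static List<int[]> join(int[] leftKeys, int[] rightKeys) {
    comparisons = 0;
    List<int[]> matches = new ArrayList<>();
    for (int left : leftKeys) {
      for (int right : rightKeys) {
        comparisons++;
        if (left == right) {
          matches.add(new int[] {left, right});
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    int[] left = {1, 2, 3};
    int[] right = {2, 3, 4};
    List<int[]> rows = join(left, right);
    // prints "2 rows, 9 comparisons"
    System.out.println(rows.size() + " rows, " + comparisons + " comparisons");
  }
}
```

The comparison count is the product of the two input sizes, and in the local case both inputs must first be fetched into the driver. Pushing the join to a remote engine avoids both costs on the driver side.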